Web Page Ranking Based on Text Substance of Linked Pages

Authors

  • SWATI KUMARI
  • ASHOK SHAKYA
Abstract

The World Wide Web is a large repository of interlinked hypertext documents accessed via the Internet. Web pages may contain text, images, video, and other multimedia data, and the user navigates among them using hyperlinks. A search engine returns millions of results and applies Web mining techniques to order them. The sorted order of search results is obtained by applying special algorithms called page ranking algorithms, which measure the importance of pages by analyzing the number of inlinked and outlinked pages. Our proposed system is built on the idea that, to rank the relevant pages higher in the retrieved document set, an analysis of both a page's text substance and its link information is required. The proposed approach assumes that the effective weight of a term in a page is computed by adding the weight of the term in the current page and the additional weight of the term in the linked pages. In this chapter, we first study the nature of web pages, the various link analysis ranking algorithms, and their limitations, and then show a comparative analysis of the ranking scores obtained through these approaches with our new suggested ranking
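The effective term weight described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formula, so everything specific here is an assumption made for illustration: relative term frequency as the per-page weight, the mean over linked pages as the "additional weight", the hypothetical scaling factor `alpha`, and the function names `term_weight` and `effective_weight`.

```python
from collections import Counter


def term_weight(term, text):
    """Relative frequency of `term` in `text` (an assumed weighting scheme)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


def effective_weight(term, page, texts, links, alpha=0.5):
    """Own-page weight plus `alpha` times the mean weight over linked pages.

    `alpha` and the use of a mean are assumptions, not taken from the paper.
    """
    own = term_weight(term, texts[page])
    linked = links.get(page, [])
    if not linked:
        return own
    extra = sum(term_weight(term, texts[p]) for p in linked) / len(linked)
    return own + alpha * extra


# Toy data: page "a" links to pages "b" and "c".
texts = {
    "a": "ranking web pages by text",
    "b": "ranking ranking links",
    "c": "images and video",
}
links = {"a": ["b", "c"]}

score = effective_weight("ranking", "a", texts, links)
```

With this toy data, page "a" scores higher for "ranking" than its own text alone would give it, because the linked page "b" also uses the term. A page with no linked pages keeps its own-page weight unchanged.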

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Ignoring Irrelevant Pages in Weighted PageRank Algorithm using Text Content of the Target Page

The web is expanding day by day, and people generally rely on search engines to explore it. The web has created many challenges for information retrieval. The quality of the information extracted is one of the major issues to be addressed, and current information retrieval approaches need to be modified to meet such challenges. While doing query based searching, the search engines ...

Ranking Entities in the Age of Two Webs, an Application to Semantic Snippets

By Mazen Alsarem: The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that can benefit from a combination of the Web of...

Publication date: 2014